85 research outputs found

    Simulation of genome-wide evolution under heterogeneous substitution models and complex multispecies coalescent histories

    Genomic evolution can be highly heterogeneous. Here, we introduce a new framework to simulate genome-wide sequence evolution under a variety of substitution models that may change along the genome and the phylogeny, following complex multispecies coalescent histories that can include recombination, demographics, longitudinal sampling, population subdivision/species history, and migration. A key aspect of our simulation strategy is that the heterogeneity of the whole evolutionary process can be parameterized according to statistical prior distributions specified by the user. We used this framework to carry out a study of the impact of variable codon frequencies across genomic regions on the estimation of the genome-wide nonsynonymous/synonymous ratio. We found that both variable codon frequencies across genes and rate variation among sites and regions can lead to severe underestimation of the global dN/dS values. The program SGWE (Simulation of Genome-Wide Evolution) is freely available from http://code.google.com/p/sgwe-project/, including extensive documentation and detailed examples.
    Ministerio de Ciencia e Innovación | Ref. JCI-2011-1045

    CellPhy: accurate and fast probabilistic inference of single-cell phylogenies from scDNA-seq data

    We introduce CellPhy, a maximum likelihood framework for inferring phylogenetic trees from somatic single-cell single-nucleotide variants. CellPhy leverages a finite-site Markov genotype model with 16 diploid states and considers amplification error and allelic dropout. We implement CellPhy in RAxML-NG, a widely used phylogenetic inference package that provides statistical confidence measurements and scales well on large datasets with hundreds or thousands of cells. Comprehensive simulations suggest that CellPhy is more robust to single-cell genomics errors and outperforms state-of-the-art methods under realistic scenarios, both in accuracy and speed.
    European Research Council | Ref. ERC-617457-PHYLOCANCER; Agencia Estatal de Investigación | Ref. PID2019-106247GB-I00; Fundação para a Ciência e a Tecnologia | Ref. PTDC/BIA-EVL/32030/2017; Xunta de Galici

    Protein evolution along phylogenetic histories under structurally constrained substitution models

    Motivation: Models of molecular evolution aim at describing the evolutionary processes at the molecular level. However, current models rarely incorporate information from protein structure. Conversely, structure-based models of protein evolution have not been commonly applied to simulate sequence evolution in a phylogenetic framework, and they often ignore relevant evolutionary processes such as recombination. An evolutionary simulation framework that integrates substitution models accounting for protein structure stability should be able to generate more realistic in silico evolved proteins for a variety of purposes. Results: We developed a method to simulate protein evolution that combines models of protein folding stability, such that the fitness depends on the stability of the native state both with respect to unfolding and misfolding, with phylogenetic histories that can be either specified by the user or simulated with the coalescent under complex evolutionary scenarios, including recombination, demographics and migration. We have implemented this framework in a computer program called ProteinEvolver. Remarkably, comparing these models with empirical amino acid replacement models, we found that the former produce amino acid distributions closer to distributions observed in real protein families, and proteins that are predicted to be more stable. Therefore, we conclude that evolutionary models that consider protein stability and realistic evolutionary histories constitute a better approximation of the real evolutionary process.
    Ministerio de Ciencia e Innovación | Ref. BFU2011-24595; Ministerio de Economía y Competitividad | Ref. BFU2012-40020; Ministerio de Ciencia e Innovación | Ref. JCI-2011-1045

    CodABC: a computational framework to coestimate recombination, substitution, and molecular adaptation rates by approximate Bayesian computation

    The estimation of substitution and recombination rates can provide important insights into the molecular evolution of protein-coding sequences. Here, we present a new computational framework, called CodABC, to jointly estimate recombination, substitution and synonymous and non-synonymous rates from coding data. CodABC uses approximate Bayesian computation (ABC) with and without regression adjustment and implements a variety of codon models, intracodon recombination and longitudinal sampling. CodABC can provide accurate joint parameter estimates from recombining coding sequences, often outperforming maximum likelihood methods based on more approximate models. In addition, CodABC allows for the inclusion of several nuisance parameters such as those representing codon frequencies, transition matrices, heterogeneity across sites or invariable sites. CodABC is freely available from http://code.google.com/p/codabc/, includes a GUI, extensive documentation and ready-to-use examples, and can run in parallel on multicore machines.
    Ministerio de Ciencia e Innovación | Ref. JCI-2011-10452; Fundação para a Ciência e a Tecnologia | Ref. EXCL/BIA-ANM/0549/201

    Rapid evolution and biogeographic spread in a colorectal cancer

    How and when tumoral clones start spreading to surrounding and distant tissues is currently unclear. Here we leveraged a model-based evolutionary framework to investigate the demographic and biogeographic history of a colorectal cancer. Our analyses strongly support an early monoclonal metastatic colonization, followed by a rapid population expansion at both primary and secondary sites. Moreover, we infer a hematogenous metastatic spread under positive selection, plus the return of some tumoral cells from the liver back to the colon lymph nodes. This study illustrates how sophisticated techniques typical of organismal evolution can provide a detailed, quantitative picture of the complex tumoral dynamics over time and space.
    European Research Council | Ref. ERC-617457-PHYLOCANCER; Ministerio de Economía y Competitividad | Ref. BFU2015-63774-P; Instituto de Salud Carlos III | Ref. PI15/01501-FEDE

    Transcriptomic landscape of the kleptoplastic sea slug Elysia viridis

    The longevity of kleptoplasts and sea slugs during starvation may be mediated by multiple factors, including the recognition of the plastids during feeding by multiple receptors (e.g. PRRs, CTLRs and SRs) and ROS-quenching proteins using enzymatic and nonenzymatic mechanisms. In particular, in the transcriptome of E. viridis we found that the presence of CDSs corresponds to multiple PRRs that may be involved in the plastid-recognition process; this is despite the fact that this species has a low receptor richness in comparison with other elysoids (Melo Clavijo et al., 2020). In addition, we also detected multiple enzymatic families involved in the ROS-quenching response. In contrast, the production of antioxidant compounds may contribute in only a minor way to the control of oxidative stress. A further enriched GO category in species that sequester chloroplasts corresponded to G protein-coupled receptors, which suggests that these receptors may be required for plastid recognition in Sacoglossan sea slugs, paralleling their role in other symbioses, such as the mutualism between cnidarians and dinoflagellates (Rosset et al., 2020). Sacoglossan sea slugs may also require the presence of iron ions to reduce the oxidative stress generated after plastid acquisition. All this evidence, derived from the transcriptome analysis of E. viridis, sheds interesting new light on the possible mechanisms used by sea slugs to recognize and establish kleptoplasts within their bodies.
    Xunta de Galicia | Ref. GPC2014/067; Xunta de Galicia | Ref. ED481A-2018/30

    Phylovar: toward scalable phylogeny-aware inference of single-nucleotide variations from single-cell DNA sequencing data

    Motivation: Single-nucleotide variants (SNVs) are the most common variations in the human genome. Recently developed methods for SNV detection from single-cell DNA sequencing data, such as SCI and scVILP, leverage the evolutionary history of the cells to overcome the technical errors associated with single-cell sequencing protocols. Despite being accurate, these methods are not scalable to the extensive genomic breadth of single-cell whole-genome (scWGS) and whole-exome sequencing (scWES) data. Results: Here, we report on a new scalable method, Phylovar, which extends the phylogeny-guided variant calling approach to sequencing datasets containing millions of loci. Through benchmarking on simulated datasets under different settings, we show that Phylovar outperforms SCI in terms of running time while being more accurate than Monovar (which is not phylogeny-aware) in terms of SNV detection. Furthermore, we applied Phylovar to two real biological datasets: an scWES triple-negative breast cancer dataset consisting of 32 cells and 3375 loci, as well as an scWGS dataset of neuron cells from a normal human brain containing 16 cells and approximately 2.5 million loci. For the cancer data, Phylovar detected somatic SNVs with high or moderate functional impact that were also supported by a bulk sequencing dataset; for the neuron dataset, Phylovar identified 5745 SNVs with non-synonymous effects, some of which were associated with neurodegenerative diseases. Availability and implementation: Phylovar is implemented in Python and is publicly available at https://github.com/NakhlehLab/Phylovar.
    National Science Foundation | Ref. IIS-1812822; National Science Foundation | Ref. IIS-210683

    Joint analysis of species and genetic variation to quantify the role of dispersal and environmental constraints in community turnover

    Spatial turnover of biological communities is determined by both dispersal and environmental constraints. However, we lack quantitative predictions about how these factors interact and influence turnover across genealogical scales. In this study, we have implemented a predictive framework based on approximate Bayesian computation (ABC) to quantify the signature of dispersal and environmental constraints in community turnover. First, we simulated the distribution of haplotypes, intra‐specific lineages and species in biological communities under different strengths of dispersal and environmental constraints. Our simulations show that spatial turnover rate is invariant across genealogical scales when dispersal limitation determines the species ranges. However, when environmental constraint limits species ranges, spatial turnover rates vary across genealogical scales. These simulations were used in an ABC framework to quantify the role of dispersal and environmental constraints in 16 empirical biological communities sampled from local to continental scales, including several groups of insects (both aquatic and terrestrial), molluscs and bats. In seven datasets, the observed genealogical invariance of spatial turnover, assessed with distance–decay curves, suggests a dispersal‐limited scenario. In the remaining datasets, the variance in distance–decay curves across genealogical scales was best explained by various combinations of dispersal and environmental constraints. Our study illustrates how modelling spatial turnover at multiple genealogical scales (species and intraspecific lineages) provides relevant insights into the relative role of dispersal and environmental constraints in community turnover.
    Agencia Estatal de Investigación | Ref. CGL2016-76637-P; Agencia Estatal de Investigación | Ref. PID2020-112935GB-I00; Agencia Estatal de Investigación | Ref. PGC2018-099363-B-I00; Ministerio de Economía y Competitividad | Ref. RYC-2015-18241; Xunta de Galici

    Clonality and timing of relapsing colorectal cancer metastasis revealed through whole-genome single-cell sequencing

    Recurrence of tumor cells following local and systemic therapy is a significant hurdle in cancer. Most patients with metastatic colorectal cancer (mCRC) will relapse, despite resection of the metastatic lesions. A better understanding of the evolutionary history of recurrent lesions is required to identify the spatial and temporal patterns of metastatic progression and expose the genetic and evolutionary determinants of therapeutic resistance. With this goal in mind, here we leveraged a unique single-cell whole-genome sequencing dataset from recurrent hepatic lesions of an mCRC patient. Our phylogenetic analysis confirms that the treatment induced a severe demographic bottleneck in the liver metastasis but also that a previously diverged lineage survived this surgery, possibly after migration to a different site in the liver. This lineage evolved very slowly for two years under adjuvant drug therapy and diversified again in a very short period. We identified several non-silent mutations specific to this lineage and inferred a substantial contribution of chemotherapy to the overall, genome-wide mutational burden. All in all, our study suggests that mCRC subclones can migrate locally and evade resection, keep evolving despite rounds of chemotherapy, and re-expand explosively.
    Open access publication funded by Universidade de Vigo/CISUG; Ministerio de Ciencia e Innovación | Ref. PID2019-106247GB-I00; AXA Research Fund; Asociación Española Contra el Cáncer; Xunta de Galicia | Ref. ED481A-2018/30